1 Research Question

The aim of this analysis is to investigate diabetes prevalence over time. The analysis looks at the variables year, country and sex to identify groups with high diabetes prevalence.

2 Dataset Introduction

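The dataset ‘DIABETES evolution of diabetes over time’ is a global dataset of diabetes prevalence from 1980 to 2014. It contains 14,000 observations of 7 variables: Country/Region/World, ISO, Sex, Year, Age-standardised diabetes prevalence, Lower 95% uncertainty interval and Upper 95% uncertainty interval.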

Table 2.1 below shows the first six observations of the full dataset.

# Load packages: readr for read_csv(), knitr for kable()
library(readr)
library(knitr)

# Read in data
data_full <- read_csv("Data/Diabetes_data.csv")

# Keep the first six observations
data_full_head <- head(data_full)

# Display in a table
kable(data_full_head,
      caption = "First Six Observations of the Full Diabetes Dataset",
      digits = 2)
Table 2.1: First Six Observations of the Full Diabetes Dataset
| Country/Region/World | ISO | Sex | Year | Age-standardised diabetes prevalence | Lower 95% uncertainty interval | Upper 95% uncertainty interval |
|---|---|---|---|---|---|---|
| Afghanistan | AFG | Men | 1980 | 0.04 | 0.02 | 0.09 |
| Afghanistan | AFG | Men | 1981 | 0.05 | 0.02 | 0.09 |
| Afghanistan | AFG | Men | 1982 | 0.05 | 0.02 | 0.09 |
| Afghanistan | AFG | Men | 1983 | 0.05 | 0.02 | 0.09 |
| Afghanistan | AFG | Men | 1984 | 0.05 | 0.02 | 0.09 |
| Afghanistan | AFG | Men | 1985 | 0.05 | 0.02 | 0.09 |

3 Dataset Description

The full dataset was reduced to 1000 observations by randomly sampling row numbers. The variable “ISO” was removed because it is not needed for the analysis. The reduced data has 6 variables (although the limit is 5 variables, I treated the lower and upper 95% uncertainty interval variables as one variable). Figure 3.1 below shows the code used to tidy the full dataset into the reduced dataset.

knitr::include_graphics("Image/code_screenshot.png")

Figure 3.1: Code Screenshot of Data Tidying
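Because the tidying code is only available as a screenshot, the steps described above can be sketched in text form. This is a minimal base-R sketch, not the code in Figure 3.1: it runs on a made-up stand-in table (`toy_full`, hypothetical) with the same column names as Table 2.1, and the seed is an assumption since the original one is not stated. The short column names are the ones shown by str() below.

```r
# Made-up stand-in for the full dataset: same 7 column names as Table 2.1,
# random values, 14,000 rows. In the real analysis this is read_csv() output.
set.seed(2023)  # assumed seed; the seed used in the original analysis is not stated
n_full <- 14000
toy_full <- data.frame(
  `Country/Region/World` = rep(c("Afghanistan", "Australia"), length.out = n_full),
  ISO = rep(c("AFG", "AUS"), length.out = n_full),
  Sex = rep(c("Men", "Women"), each = 35, length.out = n_full),
  Year = rep(1980:2014, length.out = n_full),
  `Age-standardised diabetes prevalence` = runif(n_full, 0.02, 0.20),
  `Lower 95% uncertainty interval` = runif(n_full, 0.01, 0.10),
  `Upper 95% uncertainty interval` = runif(n_full, 0.10, 0.30),
  check.names = FALSE
)

# 1. Randomly generate 1000 distinct row numbers and keep those rows
row_ids <- sample(nrow(toy_full), 1000)
toy_reduced <- toy_full[row_ids, ]

# 2. Drop the ISO column, which is not needed for the analysis
toy_reduced$ISO <- NULL

# 3. Shorten the long column names to the ones used later in the report
old_names <- c("Age-standardised diabetes prevalence",
               "Lower 95% uncertainty interval",
               "Upper 95% uncertainty interval")
new_names <- c("diabetes_prevalence", "lower_95", "upper_95")
names(toy_reduced)[match(old_names, names(toy_reduced))] <- new_names

str(toy_reduced)
```

With dplyr, the same three steps could be written with slice_sample(n = 1000), select(-ISO) and rename(); sample() draws without replacement by default, so no row is selected twice.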

The function str() is applied to the first 2 rows of the reduced data to show the type of each variable (numeric, character/factor, etc.).

# keep only the first 2 rows of the reduced data
head_data_2 <- head(data,2)

str(head_data_2)
## tibble [2 × 6] (S3: tbl_df/tbl/data.frame)
##  $ Country/Region/World: chr [1:2] "Micronesia (Federated States of)" "Pakistan"
##  $ Sex                 : chr [1:2] "Men" "Women"
##  $ Year                : num [1:2] 2013 1982
##  $ diabetes_prevalence : num [1:2] 0.2003 0.0612
##  $ lower_95            : num [1:2] 0.124 0.028
##  $ upper_95            : num [1:2] 0.293 0.115

4 Data Summary

The mean and standard deviation of diabetes prevalence (and of the lower and upper 95% uncertainty bounds) were calculated by “Year”. Table 4.1 shows the results. This section required grouping by a factor/character variable; ‘Year’ is numeric, but it was chosen because it best reflects the research question.

# dplyr provides the pipe, group_by() and summarise()
library(dplyr)

# group data by year and create summary statistics
data_summary <- data %>%
  group_by(Year) %>%
  summarise(mean_diabetes = mean(diabetes_prevalence),
            sd_diabetes = sd(diabetes_prevalence),
            mean_upper95 = mean(upper_95),
            sd_upper95 = sd(upper_95),
            mean_lower95 = mean(lower_95),
            sd_lower95 = sd(lower_95))
# keep only the 10 latest years
tail_data_summary <- tail(data_summary, 10)

# create table
kable(tail_data_summary,
      caption = "Mean and Standard Deviation of Diabetes Prevalence by Year (Latest 10 Years)",
      digits = 3)
Table 4.1: Mean and Standard Deviation of Diabetes Prevalence by Year (Latest 10 Years)
Year mean_diabetes sd_diabetes mean_upper95 sd_upper95 mean_lower95 sd_lower95
2005 0.083 0.033 0.117 0.042 0.025 0.025
2006 0.085 0.046 0.120 0.058 0.036 0.036
2007 0.095 0.050 0.132 0.063 0.038 0.038
2008 0.092 0.037 0.129 0.046 0.031 0.031
2009 0.078 0.026 0.116 0.035 0.020 0.020
2010 0.083 0.035 0.125 0.048 0.026 0.026
2011 0.089 0.050 0.131 0.065 0.037 0.037
2012 0.104 0.061 0.156 0.082 0.043 0.043
2013 0.105 0.058 0.163 0.081 0.038 0.038
2014 0.083 0.040 0.135 0.058 0.026 0.026

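Table 4.1 shows at most a weak upward tendency in mean diabetes prevalence from 2005 to 2014. The yearly means fluctuate between 7.8% (2009) and 10.5% (2013), and 2014 (8.3%) is back at the 2005 level. The highest mean prevalence in this period was in 2013, while the highest standard deviation was in 2012 (0.061); 2009 had both the lowest mean and the lowest standard deviation (0.026).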
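The statements about Table 4.1 can be checked directly from the tabulated values. The sketch below copies the rounded means and standard deviations from Table 4.1 into a hypothetical data frame `tab41` (it does not recompute them from the raw data) and fits an ordinary least-squares line to the ten yearly means; with only ten noisy points, the slope is descriptive rather than evidence of a trend.

```r
# Rounded yearly means and standard deviations copied from Table 4.1 (2005 to 2014)
tab41 <- data.frame(
  Year = 2005:2014,
  mean_diabetes = c(0.083, 0.085, 0.095, 0.092, 0.078, 0.083, 0.089, 0.104, 0.105, 0.083),
  sd_diabetes   = c(0.033, 0.046, 0.050, 0.037, 0.026, 0.035, 0.050, 0.061, 0.058, 0.040)
)

# Year with the highest mean, and year with the highest standard deviation
year_max_mean <- tab41$Year[which.max(tab41$mean_diabetes)]
year_max_sd   <- tab41$Year[which.max(tab41$sd_diabetes)]

# Least-squares slope of mean prevalence on year (change in prevalence per year)
trend <- lm(mean_diabetes ~ Year, data = tab41)
slope <- unname(coef(trend)["Year"])

year_max_mean  # 2013
year_max_sd    # 2012
slope          # about 0.0011, i.e. roughly 0.1 percentage points per year
```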

5 Visualisations

5.1 Diabetes Prevalence Over Time

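Figure 5.1 was created with the ggplot2 R package. It plots the yearly mean diabetes prevalence of the reduced data with geom_point(), adds a smoothed trend line with geom_smooth() and shows ±1 standard deviation error bars with geom_errorbar(). The plot is made interactive with ggplotly() from the plotly package.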

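# ggplot2 builds the plot; plotly's ggplotly() makes it interactive
library(ggplot2)
library(plotly)

# yearly means as points, a smoothed trend, and red error bars of +/- 1 SD
Figure_2 <- ggplot(data = data_summary, aes(x = Year, y = mean_diabetes)) +
  geom_point(alpha = 0.7) +
  xlab("Year") +
  ylab("Mean Diabetes Prevalence") +
  theme_minimal() +
  geom_smooth() +
  geom_errorbar(aes(ymin = mean_diabetes - sd_diabetes,
                    ymax = mean_diabetes + sd_diabetes),
                colour = "red", alpha = 0.3)

ggplotly(Figure_2)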

Figure 5.1: Mean Diabetes Prevalence Increases Over Time

  • There is a slight increase in mean diabetes prevalence from 1980 to 2014
  • The standard deviation bars are wide relative to the yearly means, indicating high variability in prevalence across countries and sexes within each year
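One way to put a number on this dispersion is the coefficient of variation (standard deviation divided by mean). The sketch below uses only the 2005 to 2014 rows of Table 4.1, since earlier years are not tabulated in this report; the vectors are copied from that table and the names are illustrative.

```r
# Rounded yearly means and standard deviations copied from Table 4.1
mean_diabetes <- c(0.083, 0.085, 0.095, 0.092, 0.078, 0.083, 0.089, 0.104, 0.105, 0.083)
sd_diabetes   <- c(0.033, 0.046, 0.050, 0.037, 0.026, 0.035, 0.050, 0.061, 0.058, 0.040)

# Coefficient of variation: the SD expressed as a fraction of the mean
cv <- sd_diabetes / mean_diabetes
names(cv) <- 2005:2014

round(cv, 2)
range(cv)  # from about 0.33 (2009) to about 0.59 (2012)
```

In every year the standard deviation is between roughly a third and well over half of the mean, which is what the wide error bars in Figure 5.1 reflect.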

5.2 Australian Diabetes Trends By Sex

# first filter for Australian data (full dataset, all years)
Australia_summary <- data_full %>%
  filter(`Country/Region/World` == "Australia")

# one point per year and sex, coloured by sex, with a smoothed trend for each sex
Figure_3 <- ggplot(data = Australia_summary, aes(x = Year, y = `Age-standardised diabetes prevalence`, col = Sex)) +
  geom_point(alpha = 0.8) +
  xlab("Year") +
  ylab("Age-standardised Diabetes Prevalence") +
  theme_minimal() +
  geom_smooth()

Figure_3

Figure 5.2: Men Have Higher Diabetes Prevalence Than Women

Figure 5.2 shows that age-standardised diabetes prevalence in Australia increases over time for both sexes. Men have noticeably higher prevalence than women. There is a steep increase from 1980 to 2000, followed by a plateau. Because data are only available up to 2014, it is unknown whether the plateau is followed by a downward trend.

5.3 Mean Diabetes Prevalence by Country

# filter for five countries, then average prevalence over 1980 to 2014 by country and sex
countries_summary <- data_full %>%
  filter(`Country/Region/World` %in% c("Australia", "Germany", "China", "South Africa", "United States of America")) %>%
  select(-ISO) %>%
  group_by(`Country/Region/World`, Sex) %>%
  summarise(`Mean diabetes prevalence` = mean(`Age-standardised diabetes prevalence`),
            .groups = "drop")

kable(countries_summary,
      caption = "Mean Diabetes Prevalence by Country and Sex, 1980 to 2014",
      digits = 3)
Table 5.1: Mean Diabetes Prevalence by Country and Sex, 1980 to 2014
| Country/Region/World | Sex | Mean diabetes prevalence |
|---|---|---|
| Australia | Men | 0.064 |
| Australia | Women | 0.047 |
| China | Men | 0.060 |
| China | Women | 0.061 |
| Germany | Men | 0.056 |
| Germany | Women | 0.040 |
| South Africa | Men | 0.069 |
| South Africa | Women | 0.097 |
| United States of America | Men | 0.065 |
| United States of America | Women | 0.054 |

Five countries (Australia, China, Germany, South Africa and the United States of America) were selected to compare mean diabetes prevalence from 1980 to 2014 by country and sex. In Australia, Germany and the United States of America, men have a higher mean diabetes prevalence than women. In China, mean prevalence for men and women is very similar, with women 0.001 higher. Interestingly, women in South Africa have a markedly higher mean diabetes prevalence than men (0.097 compared with 0.069).
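The sex comparison can be made explicit by taking the men-minus-women difference for each country. This sketch copies the rounded values from Table 5.1 into a hypothetical data frame `tab51`; a negative difference means women have the higher mean prevalence.

```r
# Rounded mean prevalence by country and sex, copied from Table 5.1
tab51 <- data.frame(
  country = c("Australia", "China", "Germany", "South Africa", "United States of America"),
  men     = c(0.064, 0.060, 0.056, 0.069, 0.065),
  women   = c(0.047, 0.061, 0.040, 0.097, 0.054)
)

# Positive: men higher; negative: women higher
tab51$men_minus_women <- tab51$men - tab51$women

# Sort from the largest female excess (South Africa) to the largest male excess (Australia)
tab51[order(tab51$men_minus_women), ]
```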

6 Conclusions